
Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search



Abstract

Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the state of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicle's onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.
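The teacher/student split described in the abstract can be illustrated with a deliberately small sketch. It is not the paper's method: a finite-horizon LQR gain stands in for MPC, plain least-squares imitation stands in for guided policy search, and a linear map stands in for the deep network. The double-integrator dynamics, the `SENSOR` matrix, and all function names are assumptions made up for this illustration.

```python
import numpy as np

# Hypothetical toy system (not from the paper): 1-D double integrator,
# state x = [position, velocity], scalar control input.
DT = 0.1
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([[0.0], [DT]])

# Assumed sensor model: the student sees a fixed linear mix of the state,
# never the state itself.
SENSOR = np.array([[1.0, 1.0], [1.0, -1.0]])


def teacher_gain(Q, R, horizon=200):
    """Finite-horizon Riccati recursion; an LQR stand-in for the MPC teacher."""
    P = Q.copy()
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def observe(x):
    """Map the full state to the observation available to the student."""
    return SENSOR @ x


def collect(K, rng, episodes=20, steps=30):
    """Roll out the full-state teacher and record (observation, action) pairs."""
    obs, acts = [], []
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0, size=2)
        for _ in range(steps):
            u = -K @ x                 # teacher acts on the full state
            obs.append(observe(x))     # student data holds observations only
            acts.append(u)
            x = A @ x + B @ u
    return np.array(obs), np.array(acts)


def fit_student(obs, acts):
    """Least-squares linear policy: action is approximately observation @ W."""
    W, *_ = np.linalg.lstsq(obs, acts, rcond=None)
    return W


def rollout_student(W, x0, steps=200):
    """Run the student in closed loop using observations only."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        u = observe(x) @ W
        x = A @ x + B @ u
    return x


rng = np.random.default_rng(0)
K = teacher_gain(Q=np.eye(2), R=0.1 * np.eye(1))
obs, acts = collect(K, rng)
W = fit_student(obs, acts)
final_state = rollout_student(W, [1.0, 0.0])
```

At test time only `observe` and `W` are needed, which mirrors the abstract's point that the learned policy runs without full-state knowledge and without solving a control problem at every step. In this toy case the sensor mix is invertible, so the student can reproduce the teacher exactly; with partial or noisy sensors that would not hold.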
